Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ast/opa parse] Support marshalling of all ast location data #5576

Merged
merged 8 commits into from
Mar 9, 2023
Merged

[ast/opa parse] Support marshalling of all ast location data #5576

merged 8 commits into from
Mar 9, 2023

Conversation

charlieegan3
Copy link
Contributor

@charlieegan3 charlieegan3 commented Jan 19, 2023

Fixes #3143

This is being done so that the json representation of an ast parse can
be used more easily by automated tools which need the location
information. This location information is verbose for human users of the
data and has until now been excluded from JSON representation.

This PR makes it possible to configure the JSON marshalling with some
options. However, the functionality of the OPA parse command has been
left mostly unchanged.

The only with the json output is that the fields are in a different
order:

diff --git a/new.json b/old.json
index 5b9504fa..0fb9d4f2 100644
--- a/new.json
+++ b/old.json
@@ -13,15 +13,6 @@
   },
   "rules": [
     {
-      "body": [
-        {
-          "index": 0,
-          "terms": {
-            "type": "boolean",
-            "value": true
-          }
-        }
-      ],
       "head": {
         "name": "allow",
         "value": {
@@ -34,17 +25,26 @@
             "value": "allow"
           }
         ]
-      }
+      },
+      "body": [
+        {
+          "terms": {
+            "type": "boolean",
+            "value": true
+          },
+          "index": 0
+        }
+      ]
     }
   ],
   "comments": [
     {
+      "Text": "IGNvbW1lbnQ=",
       "Location": {
         "file": "main.rego",
         "row": 3,
         "col": 1
-      },
-      "Text": "IGNvbW1lbnQ="
+      }
     }
   ]
 }

It's also possible to now show locations for all terms with another flag
option.

$ go run main.go parse main.rego --format=json --json-include=locations
{
  "package": {
    "location": {
      "file": "main.rego",
      "row": 1,
      "col": 1
    },
    "path": [
      {
        "location": {
          "file": "main.rego",
          "row": 1,
          "col": 9
        },
        "type": "var",
        "value": "data"
      },
...
}

Reviewer Notes:

@netlify
Copy link

netlify bot commented Jan 19, 2023

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 4ea1566
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/63c95f0fb667290008fc35d1
😎 Deploy Preview https://deploy-preview-5576--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

ast/policy.go Outdated Show resolved Hide resolved
ast/annotations.go Outdated Show resolved Hide resolved
@charlieegan3 charlieegan3 marked this pull request as draft January 19, 2023 17:20
@charlieegan3
Copy link
Contributor Author

I have some ideas about how to solve the mutation here and am still working on it.

@netlify
Copy link

netlify bot commented Jan 23, 2023

Deploy Preview for openpolicyagent ready!

Name Link
🔨 Latest commit 2c38a7c
🔍 Latest deploy log https://app.netlify.com/sites/openpolicyagent/deploys/64099e4be7b3e9000820d293
😎 Deploy Preview https://deploy-preview-5576--openpolicyagent.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

data["schemas"] = a.Schemas
}

if len(a.Custom) > 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many checks here to preserve the omit empty functionality.

// data.foo at foo.rego:5 has annotations {"scope":"subpackages","organizations":["Acme Corp."]}
// data.foo.bar at mod:3 has annotations {"scope":"package","description":"A couple of useful rules"}
// data.foo at foo.rego:5 has annotations {"organizations":["Acme Corp."],"scope":"subpackages"}
// data.foo.bar at mod:3 has annotations {"description":"A couple of useful rules","scope":"package"}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new marshalling functions, some of the field orderings have changed.

ast/parser.go Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser {
return p
}

// WithJSONFields sets the JSON fields that should be exposed in the JSON
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if JSONFields would be better as a custom options type. e.g. ASTJSONMarshalOptions which then each AST struct could read from to determine how to marshal itself...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. as opposed to this being global and applied to all nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have been clearer here. I was considering changing the type of the jsonFields from map[string]bool to a custom struct for this use case, such as ASTJSONMarshalOptions.

ASTJSONMarshalOptions could wrap settings for global options and individual types. E.g. allow the changing of the json marshalling behaviour for a particular type only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... what would be an example use-case where we'd want to change the marshalling? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we might want to have more control over which fields on which types are enabled / disabled.

For example, we might want to turn on locations for every only. This would be marginally easier to retrofit with a custom struct rather than a map[string]bool.

map[string]bool{
  "every_location": true,
}
ASTJSONMarshalOptions{
  EveryLocations: true,
}

Mostly because it'd reduce the number of presence checks in the map and make it easier to make targeted adjustments to certain types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing the map in JSONFields map[string]bool with a custom struct would give us the flexibility to expand the feature. We can always start with the global options and then later add in more fine-grained controls.

Another thought is do we need a map of node field type -> expose or not. We can simply not expose by default (ie. current behavior) and then only do so when explicitly set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks. I think that makes sense to me too. I'll have a go at this now.

do we need a map of node field type -> expose or not.

Perhaps it makes sense to do this now as part of an options wrapper which might grow in future. I'll see what it looks like.

Copy link
Contributor Author

@charlieegan3 charlieegan3 Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a go at this in e15a39b 64b5764, let me know what you think 🙂

return nil
}

func unmarshalLocation(loc *Location, v map[string]interface{}) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added this since I needed to expand the functionality of term unmarshalling now that a location might have been included in the output data. I'm not sure if this is correct, perhaps it should not be possible to unmarshal and get the location data back in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only thing that this could affect is where error locations are reported for example, then I don't think it'll be an issue to leave in. If anyone else sees an issue though, I'll defer to them on this.

@charlieegan3 charlieegan3 marked this pull request as ready for review January 23, 2023 17:29
@charlieegan3
Copy link
Contributor Author

Thanks all for the first round of reviews, I've reworked this following a suggestion from Ash which I think is a big improvement. Let me know what you think :)

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlieegan3 this looks good. Few questions to clarify my understanding of the changes.

Term *Term
ExpectedJSON string
}{
"base case": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the base case and location excluded cases are the same ie by default we won't include location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct 🙂 This should be the case for all types.

return &Import{
Path: &term,
Location: NewLocation([]byte{}, "example.rego", 1, 2),
jsonFields: map[string]bool{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The other tests do not include the jsonFields field in the output which is fine I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good spot. It's not needed and is inconsistent. I have removed it in 8e2fe4e

ast/parser.go Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser {
return p
}

// WithJSONFields sets the JSON fields that should be exposed in the JSON
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. as opposed to this being global and applied to all nodes?

ast/policy.go Outdated
}

// TODO: Comment Location is also always included for legacy reasons. jsonFields data is ignored.
if c.Location != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Is this check needed if the location is also always included ?

Copy link
Contributor Author

@charlieegan3 charlieegan3 Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good point. In order to leave this functionality as it is I have changed things around in ee24358.

I think it's clear enough to note this inconsistency on the struct and leave out the no-op custom marshal function for now. I have left the tests in however.

cmd/parse.go Outdated
@@ -77,5 +105,7 @@ func parse(args []string, stdout io.Writer, stderr io.Writer) int {

func init() {
parseCommand.Flags().VarP(parseParams.format, "format", "f", "set output format")
parseCommand.Flags().StringVarP(&parseParams.jsonInclude, "json-include", "", "", "select optional elements, current options: locations, comments. E.g. --json-include locations,-comments will include locations and exclude comments.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why comments was included as a --json-include option. Is it because it's easy to remove from the parsed module or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I added it for two reasons:

  • Comments use an inconsistent JSON key name where the fields are capitalised for legacy reasons. For this reason, a use of the command might want to drop them if they are interested in walking the JSON and processing it in an automated way.
  • I wanted to include an additional option beyond locations for this flag so that it was clear how it was to be used.

Clearly the latter reason isn't very strong and I'd be more than happy to remove this functionality. I just thought it make it clearer in the PR how that flag was used.

philipaconrad
philipaconrad previously approved these changes Jan 30, 2023
Copy link
Contributor

@philipaconrad philipaconrad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: @charlieegan3, I'd suggest waiting to merge until @ashutosh-narkar has had a chance to review this, but I'm marking approve because this looks solid to me.

LGTM! 🙂 I don't see anything wild or terribly unexpected here, although I have 1-2 small curiosity questions. In particular, I'd like to see whether or not we can get the Comment/comment spelling inconsistency resolved eventually! 😅

}
}

// TODO: Comment has inconsistent JSON field names starting with an upper case letter. Comment Location is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question]: Possibly a silly question, but is there a way for us to signal that the "legacy" capitalization is being deprecated, so that we can eventually change the field naming to the correct version?

Feel free to ignore if this is out-of-scope for this PR!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to do that in a good way, perhaps we could log to stderr but I'm not sure tbh.

ast/parser.go Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser {
return p
}

// WithJSONFields sets the JSON fields that should be exposed in the JSON
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... what would be an example use-case where we'd want to change the marshalling? 🤔

Comment on lines +2919 to +2926
if generatedRaw, ok := v["generated"]; ok {
if b, ok := generatedRaw.(bool); ok {
expr.Generated = b
} else {
return fmt.Errorf("ast: unable to unmarshal generated field with type: %T (expected true or false)", v["generated"])
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question/curiosity]: What motivated this case? 😃 I'm always happy to see good defensive programming land in core utility functions, but this case is not entirely obvious from its surroundings. Is it caused by generated rules from ast/compiler or somesuch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding on what generated: true means is elementary. However, I think it's that during compilation, some terms are rewritten and replaced with generated terms, perhaps to 'de-sugar' or some other reason.

Why this is here is that in order to round trip marshal and unmarshal an expression, this unmarshal logic was missing. It's not really related to the main functionality of this PR.

return nil
}

func unmarshalLocation(loc *Location, v map[string]interface{}) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the only thing that this could affect is where error locations are reported for example, then I don't think it'll be an issue to leave in. If anyone else sees an issue though, I'll defer to them on this.

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlieegan3 this is getting close 😄 Just had few comments around how we define the fields to expose.

// text is not marshalled to JSON so we just drop it in our examples
rule.Location.Text = nil

testCases := map[string]struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the difference between the base and location case.

Also in general can you please help me with what are you testing with the Test*_UnmarshalJSON tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the difference between the base and location case.

The idea here was to show that by default, the locations were not included AND that they were not included when the flag was set to false explicitly. It is perhaps unnecessary!

Also in general can you please help me with what are you testing with the Test*_UnmarshalJSON tests?

Before this PR, a number of types didn't support the unmarshalling of location data found in their JSON representations. As the PR evolved, this become only relevant for the Term and Expr types. However the tests remain for all. We could strip out the unmarshal tests for all types without custom unmarshalling. See the helper unmarshalLocation in the diff for where this is used.

Copy link
Contributor Author

@charlieegan3 charlieegan3 Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the unmarshal tests for all node types other than Term and Expr in 2ce4b93 4997590

ast/parser.go Outdated
@@ -176,6 +179,12 @@ func (p *Parser) WithSkipRules(skip bool) *Parser {
return p
}

// WithJSONFields sets the JSON fields that should be exposed in the JSON
func (p *Parser) WithJSONFields(jsonFields map[string]bool) *Parser {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing the map in JSONFields map[string]bool with a custom struct would give us the flexibility to expand the feature. We can always start with the global options and then later add in more fine-grained controls.

Another thought is do we need a map of node field type -> expose or not. We can simply not expose by default (ie. current behavior) and then only do so when explicitly set.

@charlieegan3
Copy link
Contributor Author

I've rebased this again today. I'm pretty happy with the implementation now, would you be able to have another quick look @ashutosh-narkar?

cc @anderseknert

ashutosh-narkar
ashutosh-narkar previously approved these changes Mar 1, 2023
Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @charlieegan3! Can you please add a test in cmd/parse_test.go ?

@@ -444,6 +455,14 @@ func (term *Term) UnmarshalJSON(bs []byte) error {
return err
}
term.Value = val

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We should update the func desc to something like Specialized unmarshalling is required to handle Value and location

Copy link
Contributor Author

@charlieegan3 charlieegan3 Mar 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, I've added this in fea4dbf

@charlieegan3
Copy link
Contributor Author

charlieegan3 commented Mar 2, 2023

Thanks for the review Ash, I've made some changes to the parse command tests in 2c571c2. I had to make some edits to the parse function to allow the options to be passed in. Hope that all looks good now 🤞

anderseknert
anderseknert previously approved these changes Mar 2, 2023
Copy link
Member

@anderseknert anderseknert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! And much awaited 💯 🚀 👏

One thing I'm curious about, and I don't see any references to it in the PR: the rego.parse_module built-in function. It's an outlier in many ways, and I am most likely its only user 😄 From my understanding of the code here, the output of that built-in function will not change with this. Correct? It'd be useful if that would emit location data too, but I'm not sure if we can have it changed. And we can't really change the function signature to allow for configuration options either. Might have to build a custom function for my needs, which is fine :)

OTOH, I'm not sure that adding data to its AST output would be considered breaking..

"value": {
"location": {
"file": "TEMPDIR/x.rego",
"row": 3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that we get the location of the value, but not of the var. I guess it's easy to deduce that it'll be on the row 0? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed 😞 The reason for this is that Var is 'just' a string and has no other internal state so we'd have no where to store the location data or JSON options to render it. I'm not sure there is a lot we can do here without making some pretty major changes to the way the ast package works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsandall any ideas?

cmd/parse.go Outdated Show resolved Hide resolved
@charlieegan3
Copy link
Contributor Author

output of that built-in function will not change with this. Correct?

Yeah that function doesn't use any options at the moment. The default in this PR is to leave the json marshalling as it is but alter the options available. In the opa parse we set ProcessAnnotation but leave all other parser options off. In the built in no options are set which is arguably inconsistent.

we can't really change the function signature to allow for configuration options

Given this, I think it might be best to use a custom function for our use case 😉

ashutosh-narkar
ashutosh-narkar previously approved these changes Mar 7, 2023
ashutosh-narkar
ashutosh-narkar previously approved these changes Mar 8, 2023
Part of: #3143

This is being done so that the json representation of an ast parse can
be used more easily by automated tools which need the location
information. This location information is verbose for human users of the
data and has until now been excluded from JSON representation.

This PR makes it possible to configure the JSON marshalling with some
options. However, the functionality of the OPA parse command has been
left mostly unchanged.

The only with the json output is that the fields are in a different
order:

```
diff --git a/new.json b/old.json
index 5b9504fa..0fb9d4f2 100644
--- a/new.json
+++ b/old.json
@@ -13,15 +13,6 @@
   },
   "rules": [
     {
-      "body": [
-        {
-          "index": 0,
-          "terms": {
-            "type": "boolean",
-            "value": true
-          }
-        }
-      ],
       "head": {
         "name": "allow",
         "value": {
@@ -34,17 +25,26 @@
             "value": "allow"
           }
         ]
-      }
+      },
+      "body": [
+        {
+          "terms": {
+            "type": "boolean",
+            "value": true
+          },
+          "index": 0
+        }
+      ]
     }
   ],
   "comments": [
     {
+      "Text": "IGNvbW1lbnQ=",
       "Location": {
         "file": "main.rego",
         "row": 3,
         "col": 1
-      },
-      "Text": "IGNvbW1lbnQ="
+      }
     }
   ]
 }
```

It's also possible to now show locations for all terms with another flag
option.

```
$ go run main.go parse main.rego --format=json --json-include=locations
{
  "package": {
    "location": {
      "file": "main.rego",
      "row": 1,
      "col": 1
    },
    "path": [
      {
        "location": {
          "file": "main.rego",
          "row": 1,
          "col": 9
        },
        "type": "var",
        "value": "data"
      },
      {
        "location": {
          "file": "main.rego",
          "row": 1,
          "col": 9
        },
        "type": "string",
        "value": "foo"
      }
    ]
  },
  "rules": [
    {
      "body": [
        {
          "index": 0,
          "terms": {
            "type": "boolean",
            "value": true
          }
        }
      ],
      "head": {
        "name": "allow",
        "value": {
          "location": {
            "file": "main.rego",
            "row": 5,
            "col": 9
          },
          "type": "boolean",
          "value": true
        },
        "ref": [
          {
            "type": "var",
            "value": "allow"
          }
        ]
      }
    }
  ],
  "comments": [
    {
      "Location": {
        "file": "main.rego",
        "row": 3,
        "col": 1
      },
      "Text": "IGNvbW1lbnQ="
    }
  ]
}
```

Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
This change adds a new struct JSONOptions which allows the JSON
marshalling and unmarshalling of ast nodes to be further customized in
the future. It's my hope that this will be more extendable than the
simple map we were using before.

Signed-off-by: Charlie Egan <charlie@styra.com>
Before this PR, a number of types didn't support the unmarshalling of
location data found in their JSON representations. As the PR evolved,
this become only relevant for the Term and Expr types. Other unmarshal
tests are removed in this commit.

Signed-off-by: Charlie Egan <charlie@styra.com>
This adds tests for:
* pretty output
* json output
* json location and comment flags

This required the parse function to be adjusted to make the options
configurable in test cases.

Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Signed-off-by: Charlie Egan <charlie@styra.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opa parse: Allow source location to be provided in output
5 participants